# README

This repository contains the data and code required to replicate the results reported in the paper:
“Conspiracist attributes differentiate pro- and anti-vaccine online discourses about data.”

## Repository Structure
The analysis pipeline consists of five Python notebooks (01–05) and one R notebook (06), which must be run in numerical order. A further notebook (00) documents the original data collection.
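
The run order can be sketched in Python as follows. This is a minimal sketch, not part of the repository: it assumes the notebooks sit in the repository root, that Jupyter with `nbconvert` is installed, and that the numeric filename prefix defines the order. The helper names `ordered_notebooks` and `run_notebook` are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def ordered_notebooks(repo_dir=".", first=1):
    """Return the .ipynb files sorted by numeric prefix, skipping prefixes below `first`.

    Assumption: every pipeline notebook is named like "01_some_step.ipynb".
    The default first=1 leaves out the data collection notebook (00).
    """
    numbered = []
    for path in Path(repo_dir).glob("*.ipynb"):
        match = re.match(r"(\d+)_", path.name)
        if match and int(match.group(1)) >= first:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def run_notebook(path):
    """Execute one notebook in place; check=True stops the run if a cell fails."""
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(path)],
        check=True,
    )
```

Calling `run_notebook(nb)` for each `nb` in `ordered_notebooks()` would then execute the Python notebooks in order, after which 06_full_analysis.Rmd can be run in R.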

## Data Collection
00_collect_data_from_twitter.ipynb contains the code used to collect the original Twitter data.
The data collection produces two tweet datasets: anti-vaccine data discourse and pro-vaccine data discourse.

Due to Twitter/X data-sharing restrictions, we provide dehydrated datasets (tweet IDs only):

- anti_vax_ids.json
- pro_vax_ids.json

To proceed with variable construction, these tweet IDs must be rehydrated using the Twitter API. The fully hydrated tweet datasets are available upon request.

The rehydrated data files must be named anti_vax.json and pro_vax.json so that the notebooks in the rest of the pipeline can load them.
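
As an illustration of the rehydration step, the sketch below splits the tweet IDs into batches and builds one lookup URL per batch. It is not the authors' collection code. It assumes that each ID file holds a flat JSON list of tweet IDs, and that the v2 tweet lookup endpoint (up to 100 IDs per request) is used. The requested `tweet.fields` are placeholders, since the fields the notebooks actually need are not listed here.

```python
import json
from urllib.parse import urlencode

# Assumption: the v2 tweet lookup endpoint, which accepts up to 100 IDs per request.
LOOKUP_URL = "https://api.twitter.com/2/tweets"
MAX_IDS_PER_REQUEST = 100


def load_ids(path):
    """Read tweet IDs, assuming the file holds a flat JSON list of IDs."""
    with open(path, encoding="utf-8") as fh:
        return [str(tweet_id) for tweet_id in json.load(fh)]


def batch_ids(ids, size=MAX_IDS_PER_REQUEST):
    """Yield consecutive chunks of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def lookup_url(batch, tweet_fields=("created_at", "public_metrics")):
    """Build one lookup URL for a batch of IDs (the fields are placeholders)."""
    query = urlencode({"ids": ",".join(batch), "tweet.fields": ",".join(tweet_fields)})
    return f"{LOOKUP_URL}?{query}"
```

Each URL would be requested with an `Authorization: Bearer <token>` header, and the returned tweets written to anti_vax.json or pro_vax.json. Tweets that have been deleted or made private since collection will not be returned, so the rehydrated files may be smaller than the ID lists.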

## Variable Construction
The following notebooks construct the independent and control variables used in the paper’s analyses:

- 01_construct_variable_certainty.ipynb
- 02_construct_variable_causal_claims.ipynb
- 03_construct_variable_authority_figures.ipynb
- 04_construct_control_variables.ipynb

Details are provided in the Methods section of the paper.

Important: To run 01_construct_variable_certainty.ipynb and 02_construct_variable_causal_claims.ipynb, you must obtain the LIWC2015 English dictionary, which is available from https://liwc.app. 

## Analysis Preparation
05_preparing_analysis_tables.ipynb combines the variables generated in notebooks 01–04 into analysis-ready CSV files. We provide these processed datasets directly as:

- authority_analysis.csv
- causal_claims_analysis.csv
- certainty_analysis.csv
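
Before moving on to the statistical analysis, it can help to confirm that all three tables are present and non-empty. The check below is a minimal sketch using only the standard library. It assumes the CSV files are UTF-8 encoded with a header row and sit in a single directory. The helper name `summarize_tables` is hypothetical.

```python
import csv
from pathlib import Path

ANALYSIS_FILES = (
    "authority_analysis.csv",
    "causal_claims_analysis.csv",
    "certainty_analysis.csv",
)


def summarize_tables(data_dir="."):
    """Return {filename: (n_rows, column_names)}; raise if a table is missing."""
    summary = {}
    for name in ANALYSIS_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            raise FileNotFoundError(f"Missing analysis table: {path}")
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            header = next(reader, [])        # first line is taken as the header
            n_rows = sum(1 for _ in reader)  # remaining lines are data rows
        summary[name] = (n_rows, header)
    return summary
```

Printing the result of `summarize_tables()` gives the row count and column names of each table, which can be compared against the variables described in the paper.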

## Statistical Analysis
06_full_analysis.Rmd contains the R code that uses the three CSV files above to reproduce all tables and figures reported in the paper.
